Classification with Support Vector Machines

Maximal Margin Classifier

Federalist Papers

A Simple(r) Classification

Suppose we are interested in classifying a new observation as “John Jay” or “Not John Jay”.

Choosing a Line

There are many lines we could draw that split the training data perfectly between John Jay and not John Jay!

I’m just drawing any line that separates the two groups. Is there a better way to construct these lines?

Equidistant Between Observations

How should we choose between these two lines?

Maximal Margin

The “best” line is the line with the largest margin: the one that is furthest from the nearest observation on either side.

  • Where could I add a John Jay essay which would not change the line?
  • Where could I add a John Jay essay which would change the line?
  • How many observations control the shape of the line?

Difficulties with Hard Margins


It is rare to have observations that perfectly fall on either side of a line / hyperplane.


Adding one more observation could totally change our classification line!

A Different Comparison

Suppose we wanted instead to separate “Hamilton” from “Not Hamilton”…

Where should we draw our line?

Soft Margin

A soft margin is a margin that allows a limited number of observations to fall on the wrong side.

There are two decisions to make here:


  1. How big is our margin?

(M = width of margin)

  2. How many misclassified observations are we willing to have?

(C = cost of a misclassified point)
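In software these two choices collapse into a single tuning knob: the margin width is set implicitly, and we only specify the cost. A sketch using tidymodels, with illustrative cost values:

```r
library(tidymodels)

# A large cost penalizes margin violations heavily: a narrower margin
# with fewer misclassified points, but more sensitive to individual
# observations.
strict_spec <- svm_linear(cost = 10) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

# A small cost tolerates more violations: a wider, more stable margin.
lenient_spec <- svm_linear(cost = 0.1) %>%
  set_mode("classification") %>%
  set_engine("kernlab")
```

The specific values 10 and 0.1 are just for illustration; in practice the cost is usually chosen by cross-validation.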

An Initial Attempt

Width of margin: 2

How many points in the margin are misclassified?

A Second Attempt

Width of margin: 1

How many points in the margin are misclassified?

Support Vector Classifier

The support vectors are the observations that fall on the edge of the soft margin or within it (including any misclassified points). Only these observations determine where the line goes.


A support vector classifier tries to find:

a line / hyperplane that will be used to classify future observations …

… that gives us the biggest margin width (M) …

… while still respecting the cost of misclassified points (C).

Fitting a Linear Support Vector Classifier with tidymodels


svm_spec <- svm_linear(cost = 3, margin = 0.5) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

fed_recipe <- recipe(author_ah ~ PC1 + PC2, data = fed_pca_ah)

fed_wflow <- workflow() %>%
  add_model(svm_spec) %>%
  add_recipe(fed_recipe)

my_svm <- fed_wflow %>%
  fit(fed_pca_ah)

Install a New Package

You will need to install the kernlab package this week!
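If you have not installed it before, one line at the console will do it:

```r
# kernlab provides the ksvm() engine that parsnip's svm_linear() calls
install.packages("kernlab")
```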

Inspecting the Model Fit

my_svm %>% 
  extract_fit_parsnip()
parsnip model object

Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 3 

Linear (vanilla) kernel function. 

Number of Support Vectors : 16 

Objective Function Value : -38.0067 
Training error : 0.057143 
Probability model included. 

Making Predictions

predict(my_svm, new_data = fed_pca_df)
# A tibble: 70 × 1
   .pred_class 
   <fct>       
 1 Hamilton    
 2 Not Hamilton
 3 Not Hamilton
 4 Not Hamilton
 5 Not Hamilton
 6 Not Hamilton
 7 Hamilton    
 8 Hamilton    
 9 Not Hamilton
10 Not Hamilton
# ℹ 60 more rows

Plotting Predictions

How well did the model do?
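One way to see this is to color each observation by its predicted class and compare against the true author. A sketch, assuming `fed_pca_ah` contains the `PC1`, `PC2`, and `author_ah` columns used in the recipe above:

```r
library(tidymodels)
library(ggplot2)

# augment() attaches the .pred_class column to the original data
augment(my_svm, new_data = fed_pca_ah) %>%
  ggplot(aes(x = PC1, y = PC2,
             color = .pred_class, shape = author_ah)) +
  geom_point(size = 2) +
  labs(color = "Predicted", shape = "Actual")
```

Points where the color and shape disagree are the model's misclassifications.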

Model Metrics

predict(my_svm, new_data = fed_pca_ah) %>% 
  bind_cols(truth = fed_pca_ah$author_ah) %>% 
  rename(prediction = .pred_class) %>% 
  conf_mat(truth = truth, 
           estimate = prediction)
              Truth
Prediction     Hamilton Not Hamilton
  Hamilton           48            1
  Not Hamilton        3           18
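The confusion matrix can be summarized into a single number with yardstick's `accuracy()`; a sketch:

```r
# 48 + 18 = 66 of 70 predictions are correct, which matches the
# training error of 0.057143 reported by the model fit above
predict(my_svm, new_data = fed_pca_ah) %>%
  bind_cols(truth = fed_pca_ah$author_ah) %>%
  accuracy(truth = truth, estimate = .pred_class)
```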

Not Separable Groups

What if we simply couldn’t separate our data with a line / hyperplane?

Increasing the Dimensions

What if we imagine our points exist in three dimensions?

Support Vector Machine

A support vector machine is an extension of the support vector classifier that results from enlarging the feature space in a specific way, using kernels.

In this class, we will only implement polynomial kernels.
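A polynomial kernel slots into the same workflow by swapping `svm_linear()` for `svm_poly()`. A sketch, reusing `fed_recipe` and `fed_pca_ah` from the earlier slides, with an illustrative degree and cost:

```r
library(tidymodels)

# svm_poly() enlarges the feature space with a polynomial kernel;
# degree controls how flexible the decision boundary can be
svm_poly_spec <- svm_poly(degree = 2, cost = 3) %>%
  set_mode("classification") %>%
  set_engine("kernlab")

fed_poly_wflow <- workflow() %>%
  add_model(svm_poly_spec) %>%
  add_recipe(fed_recipe)

my_poly_svm <- fed_poly_wflow %>%
  fit(fed_pca_ah)
```

With `degree = 1` this reduces to the linear classifier fit earlier; higher degrees allow curved boundaries.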

Try it!

Open 07-SVM.qmd to explore how SVM (and PCA) can be used to classify the class of different zoo animals.